L14 Tables 1

Data Visualization (STAT 302)

Author

Bud Green

Published

May 22, 2024

Overview

The goal of this lab is to explore static tables using the grammar of tables from the gt and gtExtras package.

The gt() function creates a gt table object when provided with table data. Using this function is the first step in a typical gt workflow. Once we have the gt table object, we can perform styling transformations by adding on column, formatting, tab styling, and summary layers.

A few helpful links to provide more information about the gt package: gt rstudio and intro to gt

Posit (formally RStudio) hosts an annual table competition! Check out some past winning tables: 2021 2022

Dataset

We will be using the pizzaplace dataset included in the gt package, the house_prices, stream_data, and team_logo dataset which should be located in your /data subdirectory. Each dataset will require a bit of data wrangling provided in their respective exercise code chunk.

# load package(s)
library(tidyverse)
library(gt)
library(gtExtras)

# load data
# exists w/ gt library - data() puts it into environment
data("pizzaplace")
# load paces and stores rda file in environment
load("data/house_prices.rda")
# reads data, the arrow stores it in environment
stream_data <- read_csv("data/stream_data.csv", show_col_types = FALSE) |> 
  janitor::clean_names()
# similar to csv
team_logo <- read_rds("data/team_logo.rds") |> 
  janitor::clean_names()

Exercise 1

Use the pizza_summary dataset created below to recreate the following graphic as precisely as possible. We will work through this exercise together to provide a high level overview of a few of the ways you can layer on table elements to make a simple but visually appealing table.

Hints:

  • md(): Interpret input text as Markdown-formatted text
Code
# data wrangling
pizza_summary <- pizzaplace |> 
  filter(type %in% c("classic", "chicken")) |> 
  group_by(name, type) |> 
  summarize(
    number_sold = n(),
    profit = sum(price)
    ) |> 
  ungroup()
Solution
Code
# build table
pizza_summary |> 
  gt(
    rowname_col = "name",
    groupname_col = "type"
    
  ) |> 
  tab_header(
    title = md("**Pizza Place Annual Pizza Sales**"),
    subtitle = "Fiscal Year 2015"
  ) |> 
  tab_source_note(
    source_note = md("Source: *gt* package dataset")
  ) |> 
  cols_label(
    # variable = "string"
    number_sold = md("Number<br>Sold"),
    profit = md("Annual<br>Revenue")
  ) |> 
  # reorders rows, not rename
  row_group_order(groups = c("classic", "chicken")) |> 
  fmt_currency(
    columns = profit,
    currency = "USD",
    decimals = 2
  ) |> 
  fmt_number(
    columns = number_sold,
    decimals = 0
  ) |> 
  tab_style_by_grp(
    column = profit,
    fn = max,
    cell_fill(color = "red", alpha = 0.25)
  ) |> 
  tab_style_by_grp(
    column = number_sold,
    fn = max,
    cell_fill(color = "blue", alpha = 0.25)
  ) |> 
  tab_footnote(
    footnote = "Our most popular pizza",
    locations = cells_stub("classic_dlx")
  ) |> 
  tab_options(
    row_group.background.color = "gray40",
    stub.border.style = "dashed"
  ) |> 
  opt_footnote_marks(
    marks = "standard"
  ) |> 
  opt_table_outline()
Pizza Place Annual Pizza Sales
Fiscal Year 2015
Number
Sold
Annual
Revenue
classic
big_meat 1,914 $22,968.00
classic_dlx* 2,453 $38,180.50
hawaiian 2,422 $32,273.25
ital_cpcllo 1,438 $25,094.00
napolitana 1,464 $24,087.00
pep_msh_pep 1,359 $18,834.50
pepperoni 2,418 $30,161.75
the_greek 1,420 $28,454.10
chicken
bbq_ckn 2,432 $42,768.00
cali_ckn 2,370 $41,409.50
ckn_alfredo 987 $16,900.25
ckn_pesto 973 $16,701.75
southw_ckn 1,917 $34,705.75
thai_ckn 2,371 $43,434.25
Source: gt package dataset
* Our most popular pizza

Exercise 2

Using the pizza_size dataset derived from the pizzaplace dataset, recreate the following graphic as precisely as possible.

Code
# data wrangling
pizza_size <- pizzaplace |> 
  filter(
    size %in% c("S", "L", "M"), 
    type %in% c("chicken", "classic", "veggie")
    ) |> 
  group_by(type, size) |> 
  summarize(
    number_sold = n(),
    profit = sum(price)
    ) |> 
  ungroup() |> 
  rename(category = type)

Table 1

Hints:

  • tab_style() is useful for changing cell colors
  • Hex codes: "#F8766D", "#00BA38", "#619CFF"
  • size L: bold
  • size M: italic
  • size S: "white"
Solution
Code
base_pizza <- pizza_size |> 
  gt(
    # what each row w/in a group will be
    rowname_col = "size",
    # the groups that rowname_col will be for each
    groupname_col = "category"
  ) |> 
  tab_header(
    title = md("**Pizza Place Annual Pizza Sales**"),
    subtitle = "Fiscal Year 2015"
  )
Code
base_pizza |> 
  # have to do a different tab_style for each row
  tab_style(
    style = list(
      cell_fill(color = "#F8766D"),
      cell_text(color = "white")),
    locations = cells_body(rows = size =="S")
  ) |> 
  tab_style(
    style = list(
      cell_fill(color = "#00BA38"),
      cell_text(color = "black", style = "italic")),
    locations = cells_body(rows = size =="M")
  ) |> 
  tab_style(
    style = list(
      cell_fill(color = "#619CFF"),
      cell_text(color = "black", weight = "bold")),
    locations = cells_body(rows = size =="L")
  )
Pizza Place Annual Pizza Sales
Fiscal Year 2015
number_sold profit
chicken
L 4932 102339.00
M 3894 65224.50
S 2224 28356.00
classic
L 4057 74518.50
M 4112 60581.75
S 6139 69870.25
veggie
L 5403 104202.70
M 3583 57101.00
S 2663 32386.75

Table 2

Hints:

  • gt_color_rows is useful for changing entire column colors
  • manual colors c("white", "skyblue")
  • Pastel1 palette from RColorBrewer package
Solution
Code
base_pizza |> 
  gt_color_rows(
    columns = number_sold,
    palette = RColorBrewer::brewer.pal(n = 9, name = "Pastel1")
  ) |> 
  gt_color_rows(
    columns = profit,
    palette = c("white", "skyblue")
  )
Pizza Place Annual Pizza Sales
Fiscal Year 2015
number_sold profit
chicken
L 4932 102339.00
M 3894 65224.50
S 2224 28356.00
classic
L 4057 74518.50
M 4112 60581.75
S 6139 69870.25
veggie
L 5403 104202.70
M 3583 57101.00
S 2663 32386.75

Exercise 3

Using the house_illinois dataset derived from the house_price dataset, recreate the following graphic as precisely as possible.

Similar to the ggplot scales_*() function, the gt package has useful transformations that can be performed using the following syntax tab_*(), fmt_*(), cols_*(). Check the package documentation for a full list of options.

Code
# data wrangling
house_illinois <- house_prices |> 
  filter(state %in% c("Illinois")) |> 
  mutate(year = year(date)) |> 
  filter(year %in% c(2005:2010) ) |> 
  group_by(year) |> 
  summarize(
    hpi = mean(house_price_index),
    change = mean(house_price_perc)
    )

Hints:

  • color is "red" for all percents less than 0
  • percent is accounting
  • currency "USD"
  • "Housing Price Index" label is a tab_*() layer
Solution
Code
house_illinois |> 
  gt(
    # this removes the label of the column
    rowname_col = "year"
    ) |> 
  tab_header(
    title = "Average Annual Housing Price Index (HPI)",
    subtitle = md("*Illinois Prices for 2006-2010*")
  ) |> 
  tab_style(
    style = cell_text(color = "red"),
    location = cells_body(
      columns = change,
      # this is the check for if it's red or not
      rows = change < 0
    )
  ) |> 
  fmt_currency(
    columns = hpi,
    currency = "USD",
    decimals = 2
  ) |> 
  fmt_percent(
    columns = change,
    decimals = 2,
    # gives the same number of significant digits
    scale_values = TRUE,
    # commas and () for negative
    accounting = TRUE
  ) |> 
  tab_spanner(
    # the name of the new column group
    label = "Housing Price Index",
    # the columns the group is made for
    columns = c(hpi, change)
  ) |> 
  cols_label(
    hpi = "Value",
    change = "% Change"
  ) |> 
  tab_footnote(
    footnote = "Housing market crash",
    locations = cells_stub(rows = "2008")
  )
Average Annual Housing Price Index (HPI)
Illinois Prices for 2006-2010
Housing Price Index
Value % Change
2005 $138.39 7.81%
2006 $145.69 5.31%
2007 $145.54 (0.09%)
1 2008 $135.06 (7.22%)
2009 $122.30 (9.40%)
2010 $116.27 (4.93%)
1 Housing market crash

Exercise 4

Adding icons and graphics to a table is made easy with gtExtras. Using the stream_data dataset, recreate the following graphic as precisely as possible.

Hints:

  • gt_plt_bullet()
  • gt_fa_column()
  • nytimes theme()
  • fmt_symbol_first()
Solution
Code
# pipe the following code into your gt functions
stream_data |> 
  mutate(
    type = map(type, fontawesome::fa)
  ) |>
  gt() |> 
  gt_theme_nytimes() |> 
  gt_plt_bullet(
    # column is the bar
    column = nominee,
    # target is the actual bullet value that it would put on top of nominee
    target = winner,
    # pallette = c(bar color, bullet color). default is grey and red
    palette = c("grey", "blue")
  ) |> 
  fmt_symbol_first(
    column = ratio,
    suffix = "%",
    decimals = 1,
    scale_by = 100
  ) |> 
  tab_header(
    title = md("**Netflix leads Emmy Nominations & Wins:<br>2020-2021**"),
    subtitle = md("but **HBO** leads in their Wins to Nominations ratio")
  ) |> 
  gt_highlight_rows(
    # this tells which variable it looks through, then it looks through distributor for HBO and highlights that
    rows = distributor == "HBO",
    fill = "grey",
    alpha = 0.4
  ) |> 
  gt_add_divider(
    columns = type,
    sides = "right",
    color = "lightgrey",
    weight = px(1)
  ) |> 
  cols_width(
    distributor ~ px(100),
    type ~ px(40),
    ratio ~ px(60),
    nominee ~ px(230)
  )
Netflix leads Emmy Nominations & Wins:
2020-2021
but HBO leads in their Wins to Nominations ratio
distributor type ratio nominee
Netflix 27.7%
HBO 28.8%
NBC 20.5%
ABC 16.0%
Hulu 2.0%
FX Networks 11.1%
Prime Video 8.9%
Apple TV+ 23.8%
CBS 2.5%

Exercise 5

Using the team_logo dataset, recreate the following graphic as precisely as possible.

The tab_options() layer for gt is similar to the theme() option for ggplot. Here you can add customization to the background colors, line type, size, etc.

Hints:

  • gt_img_rows with height = 30
  • tab options:
    • data row padding is px(1)
    • table background color is "gray90"
    • table width is 100%
    • table font color is "red"
    • column labels have background color "black", font size "large", and font weight "bold",
    • table body has hlines color "red"
    • table body has border bottom color "red"
Solution
Code
# pipe the following code into your gt functions
team_logo |>
  dplyr::select(logo = team_logo_espn, team_name, team_nick, team_conf) |>
  slice_sample(n = 2) |> 
  gt() |> 
  gt_img_rows(
    columns = logo,
    img_source = "web",
    height = 30
  ) |> 
  tab_options(
    data_row.padding = px(1),
    table.width = "100%",
    table.background.color = "gray90",
    table.font.color = "red",
    column_labels.background.color = "black",
    column_labels.font.size = "large",
    column_labels.font.weight = "bold",
    table_body.hlines.color = "red",
    table_body.border.bottom.color =  "red"
  )
team_name team_nick team_conf
New England Patriots Patriots AFC
New York Giants Giants NFC